{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "![alt text](http://i64.tinypic.com/as8k4.jpg “Title”)  \n",
    "## [mlcourse.ai](https://mlcourse.ai/)   Open Machine learning course\n",
    "\n",
    "<center> **Author:Natalia Domozhirova, slack: @ndomozhirova** <center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center>Tutorial</center>\n",
    "# <center>KERAS : easy way to construct the Neural Networks</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "![alt text](http://i63.tinypic.com/35mpimt.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Keras is a high-level neural networks API, written in Python.\n",
    "\n",
    "Major Keras features:\n",
    "- its  capable of running on top of TensorFlow, CNTK, or Theano.\n",
    "- Keras allows for easy and fast prototyping and supports both Perceptrons, Convolutional networks and Recurrent networks (including LSTM), as well as their combinations. \n",
    "- Keras is compatible with: Python 2.7-3.6."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the process more interesting let's consider the classification example  from the real life.\n",
    "\n",
    "## Example description \n",
    "\n",
    "Let's take the task from one Hakaton, organized by some polypropylene producer this year. So, let’s consider the production of the polypropylene granules by the extruder. Extruder is a kind of “meat grinder” which has the knives at the end of the process which are cutting the output product onto the granules.  \n",
    "The problem is that sometimes the production mass has an irregular consistency and sticks to the knives. When there is a lot of stuck mass - knives can no longer function. In this case  it is necessary to stop production process, which is very expensive. If we would catch the very beginning of such sticking process -  there is a  method to very quickly and painless  clean the knives and continue production without stopping.\n",
    "So, the task is to send the stop signal to operator a bit in advance (let’s say not later then 15 minutes before such event) – so that he would have a time for necessary manipulations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<img src=\"http://i68.tinypic.com/2rr2glg.jpg\" style=\"height:250px\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we have already preprocessed normalized dataset with the vectors of the system sensors' values  (5,160 features) and 0/1 targets. It is already devided into the [train](https://drive.google.com/open?id=1TMlClLguxcXTOAJt8VKe-iLrndJuFShl) and [test](https://drive.google.com/open?id=1JonMu0wmMbUqcbSd17Qr2A3AhVF3nutZ). \n",
    "Let's download and prepare to work our datasets. In the datasets there are targets in zero column and the timestamps -in the 1st column.So, let's extract our train and test matrix as well as our targets. Also we'll transform the targets to categorical -so to have as a result our targets as  2-dimentional vectors, i.e. the vectors of probabilities of 0 and 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from keras.utils import np_utils\n",
    "\n",
    "df_train = pd.read_csv(\"train2.tsv\", sep=\"\\t\", header=None)\n",
    "df_test = pd.read_csv(\"test2.tsv\", sep=\"\\t\", header=None)\n",
    "\n",
    "Y_train = np.array(df_train[0].values.astype(int))\n",
    "Y_test = np.array(df_test[0].values.astype(int))\n",
    "X_train = np.array(df_train.iloc[:, 2:].values.astype(float))\n",
    "X_test = np.array(df_test.iloc[:, 2:].values.astype(float))\n",
    "\n",
    "Y_train = Y_train.astype(np.int)\n",
    "Y_test = Y_test.astype(np.int)\n",
    "\n",
    "Y_train = np_utils.to_categorical(Y_train, 2)\n",
    "Y_test = np_utils.to_categorical(Y_test, 2)\n",
    "\n",
    "print(X_train.shape)\n",
    "print(Y_train.shape)\n",
    "print(X_test.shape)\n",
    "print(Y_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Neural Network construction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's consider how the simple Newral Network(NN), Multilayer Perceptron (MLP), with 3 hidden layers (as a baseline), constructed by Keras, could help us to solve this problem.\n",
    "\n",
    "As we have hidden layers - this would be a Deep Neural Network. \n",
    "Also, we can see, that we need to have 5160 neurons in the input layer, as this is the size of our vector X and 2 neurons in the last layer - as this is the size of our target (vs. the picture below, where there are 4 neurons on the output layer).\n",
    "You can read, for example, [here](https://en.wikipedia.org/wiki/Multilayer_perceptron) or [here](https://towardsdatascience.com/meet-artificial-neural-networks-ae5939b1dd3a) some more information about MLP structure. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<img src=\"http://i66.tinypic.com/2d6tsm.jpg\" style=\"height:250px\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The core data structure of Keras is a  **_model_** - a way to organize layers. The simplest type of model is the **_Sequential_** model, a linear stack of layers, which is appropriate for MLP construction (for more complex architectures, you should use the Keras functional API, which allows to build arbitrary graphs of layers)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import keras\n",
    "from keras import Sequential\n",
    "\n",
    "model1 = Sequential()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the type of model is defined, we need to consistently add layers as **_Dense_**. Stacking layers is as easy as **_.add()_**.\n",
    "\n",
    "While adding the layer we need to define the **number of neurons** and ***Activation*** **functions** which we can tune afterwards. For the fist layer we also need to add the dimention of X vectors (***input_dim***). In our case this is 5,160. The last layer consists on 2 neurons exactly as our target vestors Y_train and Y_test do.\n",
    "\n",
    "The **number of layers** can also be tuned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.layers import Activation, Dense\n",
    "\n",
    "model1.add(Dense(64, input_dim=5160))\n",
    "model1.add(Activation(\"relu\"))\n",
    "\n",
    "model1.add(Dense(64))\n",
    "model1.add(Activation(\"sigmoid\"))\n",
    "\n",
    "model1.add(Dense(128))\n",
    "model1.add(Activation(\"tanh\"))\n",
    "\n",
    "model1.add(Dense(2))\n",
    "model1.add(Activation(\"softmax\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once our model looks good, we need to configure its learning process with **_.compile()_**.\n",
    "\n",
    "Here we should also describe the **_loss_** **function** and **_metrics_** we want to use as well as **_optimizer_** (the type of the Gradient descent to be used) which seem appropriate in each particular case."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model1.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can iterate on our training data in batches with the ***batch_size*** we want, where X_train and y_train are Numpy arrays just like in the Scikit-Learn API.\n",
    "Also we can define the number of ***epochs*** (i.e. the max number of the full cycles of model's training). \n",
    "***Verbose=1*** just lets us see the summary of the current stage of calcualtions.\n",
    "\n",
    "We can also printing our model parameters using ***model.summary()***. \n",
    "It is also can be useful to see the **shapes** of X_train, y_train,X_test,y_test\n",
    "\n",
    "Also, we can save **the best model version** during the trainig process via the ***callback_save*** option.\n",
    " \n",
    "And there is a ***callback_early stop*** option to stop the training process when we don't have significant improvement(defined by the ***min_delta***) during the certain number of epochs (***patience***). \n",
    " \n",
    "Now our first model is ready:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "\n",
    "model1.summary()\n",
    "\n",
    "print(X_train.shape)\n",
    "print(Y_train.shape)\n",
    "print(X_test.shape)\n",
    "print(Y_test.shape)\n",
    "\n",
    "callback_save = ModelCheckpoint(\n",
    "    \"best_model1.model1\", monitor=\"val_acc\", verbose=1, save_best_only=True, mode=\"auto\"\n",
    ")\n",
    "callback_earlystop = EarlyStopping(\n",
    "    monitor=\"val_loss\", min_delta=0, patience=10, verbose=1, mode=\"auto\"\n",
    ")\n",
    "\n",
    "\n",
    "model1.fit(\n",
    "    X_train,\n",
    "    Y_train,\n",
    "    batch_size=20,\n",
    "    epochs=10000,\n",
    "    verbose=1,\n",
    "    validation_data=(X_test, Y_test),\n",
    "    callbacks=[callback_save, callback_earlystop],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**So, we got a baseline with Accuracy= 0.79**. It looks cool, as we even didn't tune anything yet!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try to improve this result. For example, we can introduce **Dropout** - this is a kind of regularization for the Neral Networks.\n",
    "The **level of drop out** (in the brackets, along with a ***seed***) is a probability of the exclusion from the certain layer the random choice neuron during the current calculations. So, drop outs help to prevent the NN overfitting.  \n",
    "Let's create the new model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.layers import Dropout\n",
    "\n",
    "model2 = Sequential()\n",
    "\n",
    "model2.add(Dense(64, input_dim=5160))\n",
    "model2.add(Activation(\"relu\"))\n",
    "model2.add(Dropout(0.3, seed=123))\n",
    "\n",
    "model2.add(Dense(64))\n",
    "model2.add(Activation(\"sigmoid\"))\n",
    "model2.add(Dropout(0.4, seed=123))\n",
    "\n",
    "model2.add(Dense(128))\n",
    "model2.add(Activation(\"tanh\"))\n",
    "model2.add(Dropout(0.5, seed=123))\n",
    "\n",
    "model2.add(Dense(2))\n",
    "model2.add(Activation(\"softmax\"))\n",
    "\n",
    "\n",
    "model2.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])\n",
    "\n",
    "model2.summary()\n",
    "\n",
    "print(X_train.shape)\n",
    "print(Y_train.shape)\n",
    "print(X_test.shape)\n",
    "print(Y_test.shape)\n",
    "\n",
    "callback_save = ModelCheckpoint(\n",
    "    \"best_model2.model2\", monitor=\"val_acc\", verbose=1, save_best_only=True, mode=\"auto\"\n",
    ")\n",
    "callback_earlystop = EarlyStopping(\n",
    "    monitor=\"val_loss\", min_delta=0, patience=10, verbose=1, mode=\"auto\"\n",
    ")\n",
    "\n",
    "\n",
    "model2.fit(\n",
    "    X_train,\n",
    "    Y_train,\n",
    "    batch_size=20,\n",
    "    epochs=10000,\n",
    "    verbose=1,\n",
    "    validation_data=(X_test, Y_test),\n",
    "    callbacks=[callback_save, callback_earlystop],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thus, adding the drop-outs we've **increased Accuracy on the test up to 0.86830**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also **tune all gyper-parameters** like the **number of layers**, the **levels of drop-outs**, **activation functions**, **optimizer**, **the number of neurons** etc.\n",
    "For this purposes we can use, for example, another very friendly and easy-to-apply  - Hyperas library. The description with examples you can find [here](https://github.com/maxpumperla/hyperas).\n",
    "As a result of such tuning we've got the following model configuration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model3 = Sequential()\n",
    "\n",
    "model3.add(Dense(64, input_dim=5160))\n",
    "model3.add(Activation(\"relu\"))\n",
    "model3.add(Dropout(0.11729755246044238, seed=123))\n",
    "\n",
    "model3.add(Dense(256))\n",
    "model3.add(Activation(\"relu\"))\n",
    "model3.add(Dropout(0.8444244099007299, seed=123))\n",
    "\n",
    "model3.add(Dense(1024))\n",
    "model3.add(Activation(\"linear\"))\n",
    "model3.add(Dropout(0.41266207281071243, seed=123))\n",
    "\n",
    "model3.add(Dense(256))\n",
    "model3.add(Activation(\"relu\"))\n",
    "model3.add(Dropout(0.4844455237320119, seed=123))\n",
    "\n",
    "model3.add(Dense(2))\n",
    "model3.add(Activation(\"softmax\"))\n",
    "\n",
    "model3.compile(\n",
    "    loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"]\n",
    ")\n",
    "\n",
    "model3.summary()\n",
    "\n",
    "print(X_train.shape)\n",
    "print(Y_train.shape)\n",
    "print(X_test.shape)\n",
    "print(Y_test.shape)\n",
    "\n",
    "callback_save = ModelCheckpoint(\n",
    "    \"best_model3.model3\", monitor=\"val_acc\", verbose=1, save_best_only=True, mode=\"auto\"\n",
    ")\n",
    "callback_earlystop = EarlyStopping(\n",
    "    monitor=\"val_loss\", min_delta=0, patience=10, verbose=1, mode=\"auto\"\n",
    ")\n",
    "\n",
    "model3.fit(\n",
    "    X_train,\n",
    "    Y_train,\n",
    "    batch_size=60,\n",
    "    epochs=10000,\n",
    "    verbose=1,\n",
    "    validation_data=(X_test, Y_test),\n",
    "    callbacks=[callback_save, callback_earlystop],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now, with tunned parameters, we've managed to imporove Accuracy up to 0.88073**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With Keras it is also possible to use  **L1/L2 weight regularizations** which allow to apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated in the loss function that the network optimizers.Let's add some regularization on to the 1st layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras import regularizers\n",
    "\n",
    "model4 = Sequential()\n",
    "model4.add(\n",
    "    Dense(\n",
    "        64,\n",
    "        input_dim=5160,\n",
    "        kernel_regularizer=regularizers.l2(0.0015),\n",
    "        activity_regularizer=regularizers.l1(0.0015),\n",
    "    )\n",
    ")\n",
    "model3.add(Activation(\"relu\"))\n",
    "model4.add(Dropout(0.11729755246044238, seed=123))\n",
    "\n",
    "model4.add(Dense(256))\n",
    "model4.add(Activation(\"relu\"))\n",
    "model4.add(Dropout(0.8444244099007299, seed=123))\n",
    "\n",
    "model4.add(Dense(1024))\n",
    "model4.add(Activation(\"linear\"))\n",
    "model4.add(Dropout(0.41266207281071243, seed=123))\n",
    "\n",
    "model4.add(Dense(256))\n",
    "model4.add(Activation(\"relu\"))\n",
    "model4.add(Dropout(0.4844455237320119, seed=123))\n",
    "\n",
    "model4.add(Dense(2))\n",
    "model4.add(Activation(\"softmax\"))\n",
    "\n",
    "model4.compile(\n",
    "    loss=\"categorical_crossentropy\", optimizer=\"rmsprop\", metrics=[\"accuracy\"]\n",
    ")\n",
    "\n",
    "model4.summary()\n",
    "\n",
    "print(X_train.shape)\n",
    "print(Y_train.shape)\n",
    "print(X_test.shape)\n",
    "print(Y_test.shape)\n",
    "\n",
    "callback_save = ModelCheckpoint(\n",
    "    \"best_model4.model4\", monitor=\"val_acc\", verbose=1, save_best_only=True, mode=\"auto\"\n",
    ")\n",
    "callback_earlystop = EarlyStopping(\n",
    "    monitor=\"val_loss\", min_delta=0, patience=10, verbose=1, mode=\"auto\"\n",
    ")\n",
    "\n",
    "model4.fit(\n",
    "    X_train,\n",
    "    Y_train,\n",
    "    batch_size=60,\n",
    "    epochs=10000,\n",
    "    verbose=1,\n",
    "    validation_data=(X_test, Y_test),\n",
    "    callbacks=[callback_save, callback_earlystop],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, we can see, that adding regualrization with the current coeffitients to the firs layer we've got just **Accuracy of 0.84421** which didn't improve the result. This means, that, as usual, they should be carefully tuned :) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we **want to use the best trained model** we got, we can just download previously (automatically) saved the best one (via ***load_model***) and apply to the data needed.\n",
    "Let's see what we'll get on the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.models import load_model\n",
    "\n",
    "model = load_model(\"best_model3.model3\")\n",
    "result = model.predict_on_batch(X_test)\n",
    "result[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may also be interested to get a **list of all weight tensors** of the model, as Numpy arrays via ***get_weights***:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "weights = model.get_weights()\n",
    "weights[:1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides, you would propbably like to get the **model config** to re-use it in the future. This can be done via ***get_config***:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config = model.get_config()\n",
    "config"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, the model can be **reinstantiated** from its config via ***from_config***:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model3 = Sequential.from_config(config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more model tuning options proposed by Keras pls see [here](https://keras.io/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What about the other types of the Neural Networks?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Yes, you can use the similar approach re the layers' construction principles for LSTM, CNN and some other types of the Deep Neural Networks. For more details pls see [here](https://keras.io/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References:\n",
    "1. https://keras.io/\n",
    "2. https://towardsdatascience.com/meet-artificial-neural-networks-ae5939b1dd3a\n",
    "3. https://www.quantinsti.com/blog/installing-keras-python-r\n",
    "4. https://livebook.manning.com/#!/book/deep-learning-with-python/chapter-7\n",
    "5. https://github.com/hyperopt/hyperopt"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
